Combining Knowledge and CRF-Based Approach to Named Entity Recognition in Russian
نویسندگان
چکیده
Current machine-learning approaches to information extraction often include features based on large volumes of knowledge in form of gazetteers, word clusters, etc. In this paper we consider a CRF-based approach to Russian named entity recognition based on multiple lexicons. We test our system on the open Russian collection "Persons-1000" labeled with personal names. We additionally annotated this collection with names of organizations, media, locations, and geo-political entities and present the results of our experiments for one type of names (Persons) for comparison purposes, for three types (Persons, Organizations, and Locations), and ve types of names. We also compare two types of labeling schemes for Russian: IO-scheme and BIO-scheme
منابع مشابه
PAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملImprovement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
متن کاملتشخیص اسامی اشخاص با استفاده از تزریق کلمههای نامزد اسم در میدانهای تصادفی شرطی برای زبان عربی
Named Entity Recognition and Extraction are very important tasks for discovering proper names including persons, locations, date, and time, inside electronic textual resources. Accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
متن کاملApplication of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
Named Entity Recognition (NER) is one of the most common tasks of the natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expression of time, currency and others. Although there is a number of appro...
متن کامل